Synonym Acquisition Using Bilingual Comparable Corpora
Abstract
Various successful methods for synonym acquisition are based on comparing context vectors acquired from a monolingual corpus. However, a domain-specific corpus might be limited in size and, as a consequence, a query term’s context vector can be sparse. Furthermore, even terms in a domain-specific corpus are sometimes ambiguous, which makes it desirable to find only the synonyms related to one word sense. We introduce a new method for enriching a query term’s context vector with the context vectors of the query term’s translations, which are extracted from a comparable corpus. Our experimental evaluation shows that the proposed method can improve synonym acquisition. Furthermore, by selecting appropriate translations, the user can prime the query term to one sense.
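The idea of enriching a sparse context vector can be illustrated with a small sketch. The abstract does not say how the vectors are combined, so everything below is an assumption made for illustration: the translation's context vector is mapped into the query language through a hypothetical seed dictionary (`seed_dict`), the vectors are mixed by a weighted sum with a hypothetical weight `alpha`, and candidates are ranked by cosine similarity. The toy words and counts are invented.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project(vec, seed_dict):
    """Map a foreign-language context vector into the query language.

    Assumption: context words without a dictionary entry are dropped.
    """
    out = {}
    for word, weight in vec.items():
        if word in seed_dict:
            target = seed_dict[word]
            out[target] = out.get(target, 0.0) + weight
    return out


def enrich(query_vec, translation_vecs, seed_dict, alpha=0.5):
    """Mix the query vector with the projected vectors of its translations.

    Assumption: a weighted sum, with the weight (1 - alpha) shared
    equally among the selected translations.
    """
    enriched = {k: alpha * w for k, w in query_vec.items()}
    if translation_vecs:
        share = (1.0 - alpha) / len(translation_vecs)
        for tvec in translation_vecs:
            for k, w in project(tvec, seed_dict).items():
                enriched[k] = enriched.get(k, 0.0) + share * w
    return enriched


# Invented toy data: the query term "bank" has a sparse context vector.
query_vec = {"money": 1.0}
candidates = {
    "lender": {"money": 2.0, "loan": 3.0},
    "shore": {"river": 2.0, "water": 1.0},
}
# Context vector of one user-selected translation (financial sense).
translation_vec = {"Geld": 2.0, "Kredit": 4.0}
seed_dict = {"Geld": "money", "Kredit": "loan"}

enriched = enrich(query_vec, [translation_vec], seed_dict)
ranking = sorted(candidates, key=lambda c: cosine(enriched, candidates[c]), reverse=True)
```

In this toy setting the selected translation adds the context word "loan" to the query vector, which raises the similarity to "lender" and leaves "shore" unrelated. Passing the context vector of a different translation would shift the enriched vector toward another sense, which is how the priming described in the abstract could work under these assumptions.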
Similar resources
Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora
Within the framework of translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally show that it is quite effective to reduce the candidate bilingual term pairs against which bilingual term correspondences are estimated, in terms of both co...
Learning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach
Recent years saw an increased interest in the use and the construction of large corpora. With this increased interest and awareness has come an expansion in the application to knowledge acquisition and bilingual terminology extraction. The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, combination to linguistics-based pruning a...
Extracting Bilingual Lexica from Comparable Corpora Using Self-Organizing Maps
This paper aims to present a novel method of extracting bilingual lexica from comparable corpora using one of the artificial neural network algorithms, self-organizing maps (SOMs). The proposed method is very useful when a seed dictionary for translating source words into target words is insufficient. Our experiments have shown stunning results when contrasted with one of the other approaches. ...
Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval
The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, phrasal translation as well as evaluations on Cross-Language Information Retrieval. A two-stages translation model is proposed for the acquisition of bilingual terminology from comparable corpora, disambiguation and selection of best translation alternatives according to their...
Word Sense Acquisition from Bilingual Comparable Corpora
Manually constructing an inventory of word senses has suffered from problems including high cost, arbitrary assignment of meaning to words, and mismatch to domains. To overcome these problems, we propose a method to assign word meaning from a bilingual comparable corpus and a bilingual dictionary. It clusters second-language translation equivalents of a first-language target word on the basis o...